23 Feb 2017

Automatically generated data

With this data we might like to:

  • Look for trends over time

  • Compare different moments in time

  • Do other time series analysis
      • Compare seasonality
      • Fit ARIMA models

But also:

  • Difference between medians

However, our data looks like this:

library(padr)
library(dplyr)
padr::emergency %>% head
## # A tibble: 6 × 6
##        lat       lng   zip                   title          time_stamp
##      <dbl>     <dbl> <int>                   <chr>              <dttm>
## 1 40.29788 -75.58129 19525  EMS: BACK PAINS/INJURY 2015-12-10 17:40:00
## 2 40.25806 -75.26468 19446 EMS: DIABETIC EMERGENCY 2015-12-10 17:40:00
## 3 40.12118 -75.35198 19401     Fire: GAS-ODOR/LEAK 2015-12-10 17:40:00
## 4 40.11615 -75.34351 19401  EMS: CARDIAC EMERGENCY 2015-12-10 17:40:01
## 5 40.25149 -75.60335    NA          EMS: DIZZINESS 2015-12-10 17:40:01
## 6 40.25347 -75.28324 19446        EMS: HEAD INJURY 2015-12-10 17:40:01
## # ... with 1 more variables: twp <chr>

padr helps out with two challenges

Every row is a single observation, typically recorded at the level of seconds. You want to do the analysis at a (much) higher level of aggregation.

  • padr offers: thicken, used in conjunction with a data manipulation package such as dplyr.
emergency %>% thicken(interval = "month") %>% 
  count(time_stamp_month) %>% head
## # A tibble: 6 × 2
##   time_stamp_month     n
##             <date> <int>
## 1       2015-12-01  7969
## 2       2016-01-01 13205
## 3       2016-02-01 11467
## 4       2016-03-01 11101
## 5       2016-04-01 11326
## 6       2016-05-01 11423
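
To see what thicken does on a smaller scale, here is a minimal sketch on made-up data. The `events` data frame and its `ts` column are hypothetical and not part of the emergency data set. The sketch assumes thicken names the new column after the datetime column plus the interval, as it does with `time_stamp_month` above, so the result here should be `ts_hour`.

```r
library(padr)
library(dplyr)

# Hypothetical toy data: five events recorded at second-level precision
events <- data.frame(
  ts = as.POSIXct(c("2017-02-23 09:15:02", "2017-02-23 09:48:31",
                    "2017-02-23 10:05:10", "2017-02-23 12:30:00",
                    "2017-02-23 12:59:59"), tz = "UTC")
)

# thicken adds a column with each timestamp rounded down to the hour;
# count then aggregates the events per hour
hourly <- events %>%
  thicken(interval = "hour") %>%
  count(ts_hour)

print(hourly)
```

Note that the hour starting at 11:00 has no events, so it does not appear in `hourly` at all. That gap is the second challenge.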

When there is no observation, there is no record.

  • padr offers: pad, which inserts a record for every missing time point.
data.frame(dt    = as.Date(c("2017-02-23", "2017-02-26")), 
           value = c(2, 4)) %>% 
  pad
##           dt value
## 1 2017-02-23     2
## 2 2017-02-24    NA
## 3 2017-02-25    NA
## 4 2017-02-26     4
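
For count data, the NA values that pad inserts often really mean zero. The sketch below uses a hypothetical `counts` data frame and padr's `fill_by_value`, which is not shown in the slides above; it assumes the default fill value of 0.

```r
library(padr)
library(dplyr)

# Hypothetical daily counts; 2017-02-25 and 2017-02-26 have no record
counts <- data.frame(
  day = as.Date(c("2017-02-23", "2017-02-24", "2017-02-27")),
  n   = c(5, 3, 7)
)

padded <- counts %>%
  pad() %>%            # interval (day) is inferred from the data
  fill_by_value(n)     # replace the NA values in n with 0

print(padded)
```

Whether 0 is the right fill depends on the data: for event counts it usually is, but for measurements such as temperature a missing record is not the same as a zero.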